Methods and apparatus to generate audience metrics using third-party privacy-protected cloud environments

ABSTRACT

Methods and apparatus to generate audience metrics using third-party privacy-protected cloud environments. An example apparatus includes a data modifier to obtain a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched, generate a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature, and store the second matrix in first memory as the input feature, and a model generator to generate a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.

RELATED APPLICATION(S)

This patent arises from a non-provisional patent application that claims the benefit of U.S. Provisional Patent Application No. 63/024,260, which was filed on May 13, 2020. U.S. Provisional Patent Application No. 63/024,260 is hereby incorporated herein by reference in its entirety. Priority to U.S. Provisional Patent Application No. 63/024,260 is hereby claimed.

FIELD OF THE DISCLOSURE

This disclosure relates generally to monitoring audiences, and, more particularly, to methods and apparatus to generate audience metrics using third-party privacy-protected cloud environments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example system to enable the generation of audience measurement metrics based on the merging of data collected by a database proprietor and an audience measurement entity (AME).

FIG. 2 is a flowchart representative of machine readable instructions which may be executed to implement the example data modifier of FIG. 1 to reduce the dimensionality of a matrix associated with entities and embeddings.

FIG. 3 is a block diagram of an example processing platform structured to execute the instructions of FIG. 2.

The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.

Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc. are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name. As used herein, “approximately” and “about” refer to dimensions that may not be exact due to manufacturing tolerances and/or other real world imperfections. As used herein “substantially real time” refers to occurrence in a near instantaneous manner recognizing there may be real world delays for computing time, transmission, etc. Thus, unless otherwise specified, “substantially real time” refers to real time +/− 1 second.

DETAILED DESCRIPTION

Audience measurement entities (AMEs) usually collect large amounts of audience measurement information from their panelists including the number of unique audience members for particular media and the number of impressions corresponding to each of the audience members. Unique audience size, as used herein, refers to the total number of unique people (e.g., non-duplicate people) who had an impression of (e.g., were exposed to) a particular media item, without counting duplicate audience members. As used herein, an impression is defined to be an event in which a home or individual accesses and/or is exposed to media (e.g., an advertisement, content, a group of advertisements and/or a collection of content). Impression count, as used herein, refers to the number of times audience members are exposed to a particular media item. The unique audience size associated with a particular media item will always be equal to or less than the number of impressions associated with the media item because, while all audience members by definition have at least one impression of the media, an individual audience member may have more than one impression. That is, the unique audience size is equal to the impression count only when every audience member was exposed to the media only a single time (i.e., the number of audience members equals the number of impressions). Where at least one audience member is exposed to the media multiple times, the unique audience size will be less than the total impression count because multiple impressions will be associated with individual audience members. Thus, unique audience size refers to the number of unique people in an audience (without double counting any person) exposed to media for which audience metrics are being generated. Unique audience size may also be referred to as unique audience, deduplicated audience size, deduplicated audience, or audience.
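The relationship between impression count and unique audience size described above can be illustrated with a minimal sketch. The impression log, the person identifiers, and the function name `audience_metrics` are hypothetical and are chosen only for illustration.

```python
from collections import Counter


def audience_metrics(impressions):
    """Return (impression_count, unique_audience_size).

    Each element of `impressions` is the hypothetical identifier of the
    person exposed to the media item in one impression.
    """
    per_person = Counter(impressions)           # impressions per audience member
    impression_count = sum(per_person.values())
    unique_audience_size = len(per_person)      # each person is counted once
    return impression_count, unique_audience_size


# Hypothetical log: person "A" is exposed three times, "B" once, "C" twice.
log = ["A", "B", "A", "C", "A", "C"]
impression_count, unique_audience = audience_metrics(log)

# The unique audience size never exceeds the impression count.
assert unique_audience <= impression_count
```

The two values coincide only when every person in the log appears exactly once.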

Techniques for monitoring user access to Internet-accessible media, such as digital television (DTV) media and digital content ratings (DCR) media, have evolved significantly over the years. Internet-accessible media is also known as digital media. In the past, such monitoring was done primarily through server logs. In particular, media providers serving media on the Internet would log the number of requests received for their media at their servers. Basing Internet usage research on server logs is problematic for several reasons. For example, server logs can be tampered with either directly or via zombie programs, which repeatedly request media from the server to increase the server log counts. Also, media is sometimes retrieved once, cached locally and then repeatedly accessed from the local cache without involving the server. Server logs cannot track such repeat views of cached media. Thus, server logs are susceptible to both over-counting and under-counting errors.

As Internet technology advanced, the limitations of server logs were overcome through methodologies in which the Internet media to be tracked was tagged with monitoring instructions. In particular, monitoring instructions (also known as a media impression request or a beacon request) are associated with the hypertext markup language (HTML) of the media to be tracked. When a client requests the media, both the media and the impression request are downloaded to the client. The impression requests are, thus, executed whenever the media is accessed, be it from a server or from a cache.

The beacon instructions cause monitoring data reflecting information about the access to the media (e.g., the occurrence of a media impression) to be sent from the client that downloaded the media to a monitoring server. Typically, the monitoring server is owned and/or operated by an AME (e.g., any party interested in measuring or tracking audience exposures to advertisements, media, and/or any other media) that did not provide the media to the client and who is a trusted third party for providing accurate usage statistics (e.g., The Nielsen Company, LLC). Advantageously, because the beaconing instructions are associated with the media and executed by the client browser whenever the media is accessed, the monitoring information is provided to the AME irrespective of whether the client is associated with a panelist of the AME. In this manner, the AME is able to track every time a person is exposed to the media on a census-wide or population-wide level. As a result, the AME can reliably determine the total impression count for the media without having to extrapolate from panel data collected from a relatively limited pool of panelists within the population. Frequently, such beacon requests are implemented in connection with third-party cookies. Since the AME is a third party relative to the first party serving the media to the client device, the cookie sent to the AME in the impression request to report the occurrence of the media impression of the client device is a third-party cookie. Third-party cookie tracking is used by audience measurement servers to track access to media by client devices from first-party media servers.

Tracking impressions by tagging media with beacon instructions using third-party cookies is insufficient, by itself, to enable an AME to reliably determine the unique audience size associated with the media if the AME cannot identify the individual user associated with the third-party cookie. That is, the unique audience size cannot be determined because the collected monitoring information does not uniquely identify the person(s) exposed to the media. Under such circumstances, the AME cannot determine whether two reported impressions are associated with the same person or two separate people. The AME may set a third-party cookie on a client device reporting the monitoring information to identify when multiple impressions occur using the same device. However, cookie information does not indicate whether the same person used the client device in connection with each media impression. Furthermore, the same person may access media using multiple different devices that have different cookies so that the AME cannot directly determine when two separate impressions are associated with the same person or two different people.
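The following sketch illustrates why counting distinct cookies is not a reliable proxy for unique audience size. All records are hypothetical; in particular, the "person" field represents ground truth that the AME cannot observe from cookie information alone.

```python
def count_distinct(impressions, key):
    """Count the distinct values of `key` across impression records."""
    return len({record[key] for record in impressions})


# Hypothetical impressions. Only the "cookie" field is reported to the AME;
# the "person" field is shown here solely to expose the mismatch.
impressions = [
    {"cookie": "c1", "person": "alice"},  # alice on a laptop
    {"cookie": "c2", "person": "alice"},  # alice on a phone (different cookie)
    {"cookie": "c3", "person": "bob"},    # bob on a shared tablet
    {"cookie": "c3", "person": "carol"},  # carol on the same shared tablet
    {"cookie": "c3", "person": "dave"},   # dave on the same shared tablet
]

cookie_based_estimate = count_distinct(impressions, "cookie")
true_unique_audience = count_distinct(impressions, "person")
```

In this sketch, one person using two devices inflates the cookie count while three people sharing one device deflate it, so the cookie-based estimate can err in either direction.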

Furthermore, the monitoring information reported by a client device executing the beacon instructions does not provide an indication of the demographics or other user information associated with the person(s) exposed to the associated media. To at least partially address this issue, the AME establishes a panel of users who have agreed to provide their demographic information and to have their Internet browsing activities monitored. When an individual joins the panel, that person provides corresponding detailed information concerning the person's identity and demographics (e.g., gender, race, income, home location, occupation, etc.) to the AME. The AME sets a cookie on the panelist computer that enables the AME to identify the panelist whenever the panelist accesses tagged media and, thus, sends monitoring information to the AME. Additionally or alternatively, the AME may identify the panelists using other techniques (independent of cookies) by, for example, prompting the user to login or identify themselves. While AMEs are able to obtain user-level information for impressions from panelists (e.g., identify unique individuals associated with particular media impressions), most of the client devices providing monitoring information from the tagged pages are not panelists. Thus, the identity of most people accessing media remains unknown to the AME such that it is necessary for the AME to use statistical methods to impute demographic information based on the data collected for panelists to the larger population of users providing data for the tagged media. However, panel sizes of AMEs remain small compared to the general population of users.

There are many database proprietors operating on the Internet. These database proprietors provide services to large numbers of subscribers. In exchange for the provision of services, the subscribers register with the database proprietors. Examples of such database proprietors include social network sites (e.g., Facebook, Twitter, MySpace, etc.), multi-service sites (e.g., Yahoo!, Google, Axiom, Catalina, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), credit reporting sites (e.g., Experian), streaming media sites (e.g., YouTube, Hulu, etc.), etc. These database proprietors set cookies and/or other device/user identifiers on the client devices of their subscribers to enable the database proprietors to recognize their subscribers when their subscribers visit website(s) on the Internet domains of the database proprietors.

The protocols of the Internet make cookies inaccessible outside of the domain (e.g., Internet domain, domain name, etc.) on which they were set. Thus, a cookie set in, for example, the YouTube.com domain (e.g., a first party) is accessible to servers in the YouTube.com domain, but not to servers outside that domain. Therefore, although an AME (e.g., a third party) might find it advantageous to access the cookies set by the database proprietors, it is unable to do so. However, techniques have been developed that enable an AME to leverage media impression information collected in association with demographic information in subscriber databases of database proprietors to collect more extensive Internet usage (e.g., beyond the limited pool of individuals participating in an AME panel) by extending the impression request process to encompass partnered database proprietors and by using such partners as interim data collectors. In particular, this task is accomplished by structuring the AME to respond to impression requests from clients (who may not be members of an audience measurement panel and, thus, may be unknown to the AME) by redirecting the clients from the AME to a database proprietor, such as a social network site partnered with the AME, using an impression response. Such a redirection initiates a communication session between the client accessing the tagged media and the database proprietor. For example, the impression response received from the AME may cause the client to send a second impression request to the database proprietor along with a cookie set by that database proprietor. In response to receiving this impression request, the database proprietor (e.g., Facebook) can access the cookie it has set on the client to thereby identify the client based on the internal records of the database proprietor.
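The redirect-based flow described above can be sketched as a small in-memory simulation. This is an illustrative sketch only: the class names, the `dp.example` domain, the cookie values, and the dictionary-shaped impression response are all hypothetical, and no real network protocol is implemented.

```python
class DatabaseProprietor:
    """Hypothetical stand-in for a database proprietor partnered with the AME."""

    def __init__(self, domain, subscribers_by_cookie):
        self.domain = domain
        self.subscribers_by_cookie = subscribers_by_cookie
        self.demographic_impressions = []

    def handle_impression_request(self, media_id, cookies):
        # Only the cookie set in this proprietor's own domain is visible here.
        cookie = cookies.get(self.domain)
        subscriber = self.subscribers_by_cookie.get(cookie)
        if subscriber is not None:
            self.demographic_impressions.append((media_id, subscriber))
        return subscriber


class AudienceMeasurementEntity:
    """Hypothetical stand-in for the AME monitoring server."""

    def __init__(self, partner_domain):
        self.partner_domain = partner_domain
        self.impressions = []

    def handle_impression_request(self, media_id):
        self.impressions.append(media_id)
        # Impression response: redirect the client to the partnered proprietor.
        return {"redirect_to": self.partner_domain, "media_id": media_id}


def client_access(media_id, cookie_jar, ame, proprietors):
    """Simulate a client executing the impression request for tagged media."""
    response = ame.handle_impression_request(media_id)
    partner = proprietors[response["redirect_to"]]
    # The client sends only the cookie that belongs to the partner's domain.
    domain_cookies = {partner.domain: cookie_jar.get(partner.domain)}
    return partner.handle_impression_request(response["media_id"], domain_cookies)


dp = DatabaseProprietor("dp.example", {"cookie-123": "subscriber-42"})
ame = AudienceMeasurementEntity("dp.example")
proprietors = {"dp.example": dp}

matched = client_access("ad-1", {"dp.example": "cookie-123"}, ame, proprietors)
unmatched = client_access("ad-1", {}, ame, proprietors)
```

In this sketch the AME logs both impressions, while the database proprietor records only the impression it could match to a subscriber through its own cookie.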

In the event the client corresponds to a subscriber of the database proprietor (as determined from the cookie associated with the client), the database proprietor logs/records a database proprietor demographic impression in association with the client/user. As used herein, a demographic impression is an impression that can be matched to particular demographic information of a particular subscriber or registered user of the services of a database proprietor. The database proprietor has the demographic information for the particular subscriber because the subscriber would have provided such information when setting up an account to subscribe to the services of the database proprietor.

Sharing of demographic information associated with subscribers of database proprietors enables AMEs to extend or supplement their panel data with substantially reliable demographics information from external sources (e.g., database proprietors), thus extending the coverage, accuracy, and/or completeness of their demographics-based audience measurements. Such access also enables the AME to monitor persons who would not otherwise have joined an AME panel. Any web service provider having a database identifying demographics of a set of individuals may cooperate with the AME. Such web service providers may be referred to as “database proprietors” and include, for example, wireless service carriers, mobile software/service providers, social media sites (e.g., Facebook, Twitter, MySpace, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), multi-service sites (e.g., Yahoo!, Google, Experian, etc.), and/or any other Internet sites that collect demographic data of users and/or otherwise maintain user registration records. The use of demographic information from disparate data sources (e.g., high-quality demographic information from the panels of an audience measurement entity and/or registered user data of database proprietors) results in improved reporting effectiveness of metrics for both online and offline advertising campaigns.

The above approach to generating audience metrics by an AME depends upon the beacon requests (or tags) associated with the media to be monitored to enable an AME to obtain census wide impression counts (e.g., impressions that include the entire population exposed to the media regardless of whether the audience members are panelists of the AME). Further, the above approach also depends on third-party cookies to enable the enrichment of the census impressions with demographic information from database proprietors. However, in more recent years, there has been a movement away from the use of third-party cookies by third parties. Thus, while media providers (e.g., database proprietors) may still use first-party cookies to collect first-party data, the elimination of third-party cookies prevents the tracking of Internet media by AMEs (outside of client devices associated with panelists for which the AME has provided a meter to track Internet usage behavior). Furthermore, independent of the use of cookies, some database proprietors are moving towards the elimination of third-party impression requests or tags (e.g., redirect instructions) embedded in media (e.g., beginning in 2020, third-party tags will no longer be allowed on Youtube.com and other Google Video Partner (GVP) sites). As technology moves in this direction, AMEs (e.g., third parties) will no longer be able to track census wide impressions of media in the manner they have in the past. Furthermore, AMEs will no longer be able to send a redirect request to a client accessing media to cause a second impression request to a database proprietor to associate the impression with demographic information. Thus, the only Internet media monitoring that AMEs will be able to directly perform in such a system will be with panelists that have agreed to be monitored using different techniques that do not depend on third-party cookies and/or tags.

Examples disclosed herein overcome at least some of the limitations that arise out of the elimination of third-party cookies and/or third-party tags by enabling the merging of high-quality demographic information from the panels of an AME with media impression data that continues to be collected by database proprietors. As mentioned above, while third-party cookies and/or third-party tags may be eliminated, database proprietors that provide and/or manage the delivery of media accessed online are still able to track impressions of the media (e.g., via first-party cookies and/or first-party tags). Furthermore, database proprietors are still able to associate demographic information with the impressions whenever the impressions can be matched to a particular subscriber of the database proprietor for which demographic information has been collected (e.g., when the user is registered with the database proprietor). In some examples, AME panel data and database proprietor impressions data are merged in a privacy-protected cloud environment maintained by the database proprietor. The merged data may include entities for each user. These entities may be top search result click entities and/or video watch entities during a period of time. In examples disclosed herein, a search result click entity is an integer identifier that represents a search term entered by a user. In examples disclosed herein, a video watch entity is an integer identifier that represents a video viewed by a user. In examples disclosed herein, integer identifiers map to a knowledge graph of all entities for the search result clicks and/or videos watched. Additionally, embeddings may be provided for each such entity. In examples disclosed herein, an embedding is a classification of an entity. In examples disclosed herein, classifications are numerical representations (e.g., a vector array of values) of some class of similar objects, images, words, and the like. Example classifications include classifications of Internet searches requested by a user (e.g., corresponding to a top search result click entity) and classifications of media accessed by a user (e.g., corresponding to a video watch entity). In one example, the merged data for a user is provided in an entity-embeddings matrix.
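One way to picture an entity-embeddings matrix is as a table with one row per entity and one column per embedding value. The following minimal sketch builds such a matrix for a single user. The entity identifiers, the embedding values, the three-value embedding length, and the `EMBEDDINGS` lookup table are all hypothetical; the actual layout of the matrix is not specified here.

```python
# Hypothetical lookup: integer entity identifier -> embedding vector.
EMBEDDINGS = {
    101: [0.9, 0.1, 0.0],  # e.g., a search result click entity
    202: [0.2, 0.7, 0.1],  # e.g., a video watch entity
    303: [0.0, 0.3, 0.7],
}


def build_entity_embeddings_matrix(entity_ids, embeddings=EMBEDDINGS):
    """Return one row per entity; each row is that entity's embedding vector."""
    return [list(embeddings[entity_id]) for entity_id in entity_ids]


# Entities observed for one hypothetical user during a period of time.
user_entities = [202, 101]
matrix = build_entity_embeddings_matrix(user_entities)
```

Here `matrix` has two rows (one per entity) and three columns (one per embedding value).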

Examples disclosed herein may be used to reduce the dimensionality of entity-embeddings matrices corresponding to users. In examples disclosed herein, an entity-embeddings matrix is used to represent relationships between embeddings corresponding to entities. For example, an entity-embeddings matrix is reduced to a more manageable size to be used as an input feature to generate demographic correction models as disclosed herein. In some examples, a reduction technique is to select top m entities and top n embeddings. Additionally or alternatively, in other examples, a reduction technique is to calculate a weighted average of the embeddings across the entities. Additionally or alternatively, in other examples, a reduction technique is to reduce the dimension of embeddings by using a single value to represent the different embedding dimensions. In this manner, the reduced entity-embeddings matrices generated using techniques disclosed herein may be used to improve computers, computer performance, and/or computer-generated data by providing data of a manageable size to be used as an input feature for demographic correction model generation.
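The three reduction techniques named above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: it assumes the matrix has one row per entity, that "top" entities are those with the largest per-entity weight (for example, how often the entity occurred), that the "top n embeddings" are the first n values of each row, and that the single representative value is the arithmetic mean. The function names are hypothetical.

```python
def top_m_by_n(matrix, entity_weights, m, n):
    """Keep the m highest-weighted entities and the first n embedding values."""
    ranked = sorted(
        range(len(matrix)), key=lambda i: entity_weights[i], reverse=True
    )
    return [matrix[i][:n] for i in ranked[:m]]


def weighted_average(matrix, entity_weights):
    """Collapse all entities into one vector: the weighted mean embedding."""
    total = sum(entity_weights)
    dims = len(matrix[0])
    return [
        sum(w * row[d] for w, row in zip(entity_weights, matrix)) / total
        for d in range(dims)
    ]


def single_value_per_entity(matrix):
    """Represent each entity's embedding dimensions by one value (the mean)."""
    return [sum(row) / len(row) for row in matrix]


# Hypothetical 3-entity by 3-dimension matrix with per-entity weights.
matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
weights = [1, 3, 2]

reduced_top = top_m_by_n(matrix, weights, m=2, n=2)
reduced_avg = weighted_average(matrix, weights)
reduced_single = single_value_per_entity(matrix)
```

Each function yields an output whose size is fixed by its parameters rather than by the number of entities a user happens to have, which is what makes the result usable as a fixed-size input feature.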

More particularly, FIG. 1 is a block diagram illustrating an example system 100 to enable the generation of audience measurement metrics based on the merging of data collected by a database proprietor 102 and an AME 104. More particularly, in some examples, the data includes AME panel data (that includes media impressions for panelists that are associated with high-quality demographic information collected by the AME 104) and database proprietor impressions data (which may be enriched with demographic and/or other information available to the database proprietor 102). In the illustrated example, these disparate sources of data are combined within a privacy-protected cloud environment 106 managed and/or maintained by the database proprietor 102. The privacy-protected cloud environment 106 is a cloud-based environment that enables media providers (e.g., advertisers and/or content providers) and third parties (e.g., the AME 104) to input and combine their data with data from the database proprietor 102 inside a data warehouse or data store that enables efficient big data analysis. The combining of data from different parties (e.g., different Internet domains) presents risks to the privacy of the data associated with individuals represented by the data from the different parties. Accordingly, the privacy-protected cloud environment 106 is established with privacy constraints that prevent any associated party (including the database proprietor 102) from accessing private information associated with particular individuals. Rather, any data extracted from the privacy-protected cloud environment 106 following a big data analysis and/or query is limited to aggregated information. A specific example of the privacy-protected cloud environment 106 is the Ads Data Hub (ADH) developed by Google.
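The restriction of extracted data to aggregated information can be illustrated with a minimal sketch. The minimum group size used below is an assumption introduced for illustration; the text above states only that extracted data is limited to aggregated information, not how that limit is enforced. The function name, field names, and rows are hypothetical.

```python
from collections import defaultdict


def aggregated_report(rows, group_key, min_group_size):
    """Return per-group user counts, omitting groups below min_group_size.

    `min_group_size` is a hypothetical privacy threshold: groups with fewer
    distinct users are dropped so that no small group can be singled out.
    """
    users_per_group = defaultdict(set)
    for row in rows:
        users_per_group[row[group_key]].add(row["user_id"])
    return {
        group: len(users)
        for group, users in users_per_group.items()
        if len(users) >= min_group_size
    }


# Hypothetical user-level rows that never leave the environment.
rows = [
    {"user_id": "u1", "age_group": "18-24"},
    {"user_id": "u2", "age_group": "18-24"},
    {"user_id": "u3", "age_group": "18-24"},
    {"user_id": "u4", "age_group": "65+"},
]

report = aggregated_report(rows, "age_group", min_group_size=2)
```

Only the group counts are returned; the user-level rows themselves are never part of the output.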

As used herein, a media impression is defined as an occurrence of access and/or exposure to media 108 (e.g., an advertisement, a movie, a movie trailer, a song, a web page banner, etc.). Examples disclosed herein may be used to monitor for media impressions of any one or more media types (e.g., video, audio, a web page, an image, text, etc.). In examples disclosed herein, the media 108 may be primary content and/or advertisements. Examples disclosed herein are not restricted for use with any particular type of media. On the contrary, examples disclosed herein may be implemented in connection with tracking impressions for media of any type or form in a network.

In the illustrated example of FIG. 1, content providers and/or advertisers distribute the media 108 via the Internet to users that access websites and/or online television services (e.g., web-based TV, Internet protocol TV (IPTV), etc.). For purposes of explanation, examples disclosed herein are described assuming the media 108 is an advertisement that may be provided in connection with particular content of primary interest to a user. In some examples, the media 108 is served by media servers managed by and/or associated with the database proprietor 102 that manages and/or maintains the privacy-protected cloud environment 106. For example, the database proprietor 102 may be Google, and the media 108 corresponds to ads served with videos accessed via Youtube.com and/or via other Google video partners (GVPs). More generally, in some examples, the database proprietor 102 includes corresponding database proprietor servers that can serve media 108 to individuals via client devices 110. In the illustrated example of FIG. 1, the client devices 110 may be stationary or portable computers, handheld computing devices, smart phones, Internet appliances, smart televisions, and/or any other type of device that may be connected to the Internet and capable of presenting media. For purposes of explanation, the client devices 110 of FIG. 1 include panelist client devices 112 and non-panelist client devices 114 to indicate that at least some individuals that access and/or are exposed to the media 108 correspond to panelists who have provided detailed demographic information to the AME 104 and have agreed to enable the AME 104 to track their exposure to the media 108. In many situations, other individuals who are not panelists will also be exposed to the media 108 (e.g., via the non-panelist client devices 114). Typically, the number of non-panelist audience members for a particular media item will be significantly greater than the number of panelist audience members. In some examples, the panelist client devices 112 may include and/or implement an audience measurement meter 115 that captures the impressions of media 108 accessed by the panelist client devices 112 (along with associated information) and reports the same to the AME 104. In some examples, the audience measurement meter 115 may be a separate device from the panelist client device 112 used to access the media 108.

In some examples, the media 108 is associated with a unique impression identifier (e.g., a consumer playback nonce (CPN)) generated by the database proprietor 102. In some examples, the impression identifier serves to uniquely identify a particular impression of the media 108. Thus, even though the same media 108 may be served multiple times, each time the media 108 is served the database proprietor 102 will generate a new and different impression identifier so that each impression of the media 108 can be distinguished from every other impression of the media. In some examples, the impression identifier is encoded into a uniform resource locator (URL) used to access the primary content (e.g., a particular YouTube video) along with which the media 108 (as an advertisement) is served. In some examples, with the impression identifier (e.g., CPN) encoded into the URL associated with the media 108, the audience measurement meter 115 extracts the identifier at the time that a media impression occurs so that the AME 104 is able to associate a captured impression with the impression identifier.
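As an illustration of how a meter might extract an impression identifier that is encoded into a URL, the following sketch parses a query string with the Python standard library. The query parameter name `cpn`, the example URL, and the function name are assumptions made for illustration; the text above does not specify how the identifier is encoded.

```python
from urllib.parse import parse_qs, urlparse


def extract_impression_id(url, param="cpn"):
    """Return the impression identifier carried in the URL query, or None.

    `param` is a hypothetical query parameter name.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical URL for primary content served together with an advertisement.
url = "https://video.example/watch?v=abc&cpn=XyZ123"
impression_id = extract_impression_id(url)
```

When the parameter is absent, the function returns `None`, which corresponds to the case in which the meter cannot obtain the identifier directly.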

In some examples, the meter 115 may not be able to obtain the impression identifier (e.g., CPN) to associate with a particular media impression. For instance, in some examples where the panelist client device 112 is a mobile device, the meter 115 collects a mobile advertising identifier (MAID) and/or an identifier for advertisers (IDFA) that may be used to uniquely identify client devices 110 (e.g., the panelist client devices 112 being monitored by the AME 104). In some examples, the meter 115 reports the MAID and/or IDFA for the particular device associated with the meter 115 to the AME 104. The AME 104, in turn, provides the MAID and/or IDFA to the database proprietor 102 in a double blind exchange through which the database proprietor 102 provides the AME 104 with the impression identifiers (e.g., CPNs) associated with the client device 110 identified by the MAID and/or IDFA. Once the AME 104 receives the impression identifiers for the client device 110 (e.g., a particular panelist client device 112), the impression identifiers are associated with the impressions previously collected in connection with the device.

In the illustrated example, the database proprietor 102 logs each media impression occurring on any of the client devices 110 within the privacy-protected cloud environment 106. In some examples, logging an impression includes logging the time the impression occurred and the type of client device 110 (e.g., whether a desktop device, a mobile device, a tablet device, etc.) on which the impression occurred. Further, in some examples, impressions are logged along with the impression's unique impression identifier. In this example, the impressions and associated identifiers are logged in a campaign impressions database 116. The campaign impressions database 116 stores all impressions of the media 108 regardless of whether any particular impression was detected from a panelist client device 112 or a non-panelist client device 114. Furthermore, the campaign impressions database 116 stores all impressions of the media 108 regardless of whether the database proprietor 102 is able to match any particular impression to a particular subscriber of the database proprietor 102. As mentioned above, in some examples, the database proprietor 102 identifies a particular user (e.g., subscriber) associated with a particular media impression based on a cookie stored on the client device 110. In some examples, the database proprietor 102 associates a particular media impression with a user that was signed into the online services of the database proprietor 102 at the time the media impression occurred. In some examples, in addition to logging such impressions and associated identifiers in the campaign impressions database 116, the database proprietor 102 separately logs such impressions in a matchable impressions database 118. As used herein, a matchable impression is an impression that the database proprietor 102 is able to match to at least one of a particular subscriber (e.g., because the impression occurred on a client device 110 on which a user was signed into the database proprietor 102) or a particular client device 110 (e.g., based on a first-party cookie of the database proprietor 102 detected on the client device 110). In some examples, if the database proprietor 102 cannot match a particular media impression (e.g., because no user was signed in at the time the media impression occurred and there is no recognizable cookie on the associated client device 110), the impression is omitted from the matchable impressions database 118 but is still logged in the campaign impressions database 116.
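The logging rules described above can be sketched with two in-memory lists standing in for the campaign impressions database 116 and the matchable impressions database 118. The class name, field names, and example values are hypothetical, and the preference for a signed-in user over a first-party cookie when both are present is an assumption of this sketch.

```python
class ImpressionLog:
    """Hypothetical in-memory stand-in for databases 116 and 118."""

    def __init__(self):
        self.campaign = []   # stands in for campaign impressions database 116
        self.matchable = []  # stands in for matchable impressions database 118

    def log(self, impression_id, timestamp, device_type,
            signed_in_user=None, first_party_cookie=None):
        """Log an impression; return True if it was matchable."""
        record = {
            "impression_id": impression_id,
            "timestamp": timestamp,
            "device_type": device_type,
        }
        # Every impression is logged in the campaign impressions database.
        self.campaign.append(record)

        # An impression is matchable if it can be tied to a subscriber or,
        # failing that, to a client device via a first-party cookie.
        if signed_in_user is not None:
            match = ("subscriber", signed_in_user)
        elif first_party_cookie is not None:
            match = ("device", first_party_cookie)
        else:
            return False
        self.matchable.append({**record, "matched_to": match})
        return True


log = ImpressionLog()
log.log("imp-1", 1000, "mobile", signed_in_user="user-7")
log.log("imp-2", 1001, "desktop", first_party_cookie="ck-9")
log.log("imp-3", 1002, "tablet")  # no signed-in user and no cookie
```

After these three calls, all three impressions appear in the campaign list, while only the first two appear in the matchable list.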

As indicated above, the matchable impressions database 118 includes media impressions (and associated unique impression identifiers) that the database proprietor 102 is able to match to a particular user that has registered with the database proprietor 102. In some examples, the matchable impressions database 118 also includes user-based covariates that correspond to the particular user to which each impression in the database was matched. As used herein, a user-based covariate refers to any item(s) of information collected and/or generated by the database proprietor 102 that can be used to identify, characterize, quantify, and/or distinguish particular users and/or their associated behavior. For example, user-based covariates may include the name, age, and/or gender of the user (and/or any other demographic information about the user) collected at the time the user registered with the database proprietor 102, and/or the relative frequency with which the user uses the different types of client device 110, the number of media items the user has accessed during a most recent period of time (e.g., the last 30 days), the search terms entered by the user during a most recent period of time (e.g., the last 30 days), feature embeddings (numerical representations) of classifications of videos viewed and/or searches entered by the user, etc. As mentioned above, the matchable impressions database 118 also includes impressions matched to particular client devices 110 (based on first-party cookies), even when the impressions cannot be matched to particular users (based on the users being signed in at the time). In some such examples, the impressions matched to particular client devices 110 are treated as distinct users within the matchable impressions database 118. However, as no particular user can be identified, such impressions in the matchable impressions database 118 will not be associated with any user-based covariates.

Although only one campaign impressions database 116 is shown in the illustrated example, the privacy-protected cloud environment 106 may include any number of campaign impressions databases 116, with each database storing impressions corresponding to different media campaigns associated with one or more different advertisers (e.g., product manufacturers, service providers, retailers, merchants, advertisement servers, etc.). In other examples, a single campaign impressions database 116 may store the impressions associated with multiple different campaigns. In some such examples, the campaign impressions database 116 may store a campaign identifier in connection with each impression to identify the particular campaign with which the impression is associated. Similarly, in some examples, the privacy-protected cloud environment 106 may include one or more matchable impressions databases 118 as appropriate. Further, in some examples, the campaign impressions database 116 and the matchable impressions database 118 may be combined and/or represented in a single database.

In the illustrated example of FIG. 1, impressions occurring on the client devices 110 are shown as being reported (e.g., via network communications) directly to both the campaign impressions database 116 and the matchable impressions database 118. However, this should not be interpreted as necessarily requiring multiple separate network communications from the client devices 110 to the database proprietor 102. Rather, in some examples, notifications of impressions are collected from a single network communication from the client device 110, and the database proprietor 102 then populates both the campaign impressions database 116 and the matchable impressions database 118. In some examples, the matchable impressions database 118 is generated based on an analysis of the data in the campaign impressions database 116. Regardless of the particular process by which the two databases 116, 118 are populated with logged impressions, in some examples, the user-based covariates included in the matchable impressions database 118 may be combined with the logged impressions in the campaign impressions database 116 and stored in an enriched impressions database 120. Thus, the enriched impressions database 120 includes all (e.g., census wide) logged impressions of the media 108 for the relevant advertising campaign and also includes all available user-based covariates associated with each of the logged impressions that the database proprietor 102 was able to match to a particular user.

As shown in the illustrated example, whereas the database proprietor 102 is able to collect impressions from both panelist client devices 112 and non-panelist client devices 114, the AME 104 is limited to collecting impressions from panelist client devices 112. In some examples, the AME 104 also collects the impression identifier associated with each collected media impression so that the collected impressions may be matched with the impressions collected by the database proprietor 102 as described further below. In the illustrated example, the impressions (and associated impression identifiers) of the panelists are stored in an AME panel data database 122 that is within an AME first party data store 124 in an AME proprietary cloud environment 126. In some examples, the AME proprietary cloud environment 126 is a cloud-based storage system (e.g., a Google Cloud Project) provided by the database proprietor 102 that includes functionality to enable interfacing with the privacy-protected cloud environment 106 also maintained by the database proprietor 102. As mentioned above, the privacy-protected cloud environment 106 is governed by privacy constraints that prevent any party (with some limited exceptions for the database proprietor 102) from accessing private information associated with particular individuals. By contrast, the AME proprietary cloud environment 126 is indicated as proprietary because it is exclusively controlled by the AME such that the AME has full control and access to the data without limitation. While some examples involve the AME proprietary cloud environment 126 being a cloud-based system that is provided by the database proprietor 102, in other examples, the AME proprietary cloud environment 126 may be provided by a third party distinct from the database proprietor 102.

While the AME 104 is limited to collecting impressions (and associated identifiers) from only panelists (e.g., via the panelist client devices 112), the AME 104 is able to collect panel data that is much more robust than merely media impressions. As mentioned above, the panelist client devices 112 are associated with users that have agreed to participate on a panel of the AME 104. Participation in a panel includes the provision of detailed demographic information about the panelist and/or all members in the panelist's household. Such demographic information may include age, gender, race, ethnicity, education, employment status, income level, geographic location of residence, etc. In addition to such demographic information, which may be collected at the time a user enrolls as a panelist, the panelist may also agree to enable the AME 104 to track and/or monitor various aspects of the user's behavior. For example, the AME 104 may monitor panelists' Internet usage behavior including the frequency of Internet usage, the times of day of such usage, the websites visited, and the media exposed to (from which the media impressions are collected).

AME panel data (including media impressions and associated identifiers, demographic information, and Internet usage data) is shown in FIG. 1 as being provided directly to the AME panel data database 122 from the panelist client devices 112. However, in some examples, there may be one or more intervening operations and/or components that collect and/or process the collected data before it is stored in the AME panel data database 122. For instance, in some examples, impressions are initially collected and reported to a separate server and/or database that is distinct from the AME proprietary cloud environment 126. In some such examples, this separate server and/or database may not be a cloud-based system. Further, in some examples, such a non-cloud-based system may interface directly with the privacy-protected cloud environment 106 such that the AME proprietary cloud environment 126 may be omitted entirely.

In some examples, there may be multiple different techniques and/or methodologies used to collect the AME panel data, depending on the particular circumstances involved. For example, different monitoring techniques and/or different types of audience measurement meters 115 may be employed for media accessed via a desktop computer relative to the media accessed via a mobile computing device. In some examples, the audience measurement meter 115 may be implemented as a software application that panelists agree to install on their devices to monitor all Internet usage activity on the respective devices. In some examples, the meter 115 may prompt a user of a particular device to identify themselves so that the AME 104 can confirm the identity of the user (e.g., whether it was the mother or daughter in a panelist household). In some examples, prompting a user to self-identify may be considered overly intrusive. Accordingly, in some such examples, the circumstances surrounding the behavior of the user of a panelist client device 112 (e.g., time of day, type of content being accessed, etc.) may be analyzed to infer the identity of the user to some confidence level (e.g., the accessing of children's content in the early afternoon would indicate a relatively high probability that a child is using the device at that point in time). In some examples, the audience measurement meter 115 may be a separate hardware device that is in communication with a particular panelist client device 112 and enabled to monitor the Internet usage of the panelist client device 112.

In some examples, the processes and/or techniques used by the AME 104 to capture panel data (including media impressions and who in particular was exposed to the media) can differ depending on the nature of the panelist client device 112 through which the media was accessed. For instance, in some examples, the identity of the individual using the client device 112 may be based on the individual responding to a prompt to self-identify. In some examples, such prompts are limited to desktop client devices because such a prompt is viewed as overly intrusive on a mobile device. However, without specifically prompting a user of a mobile device to self-identify, there often is no direct way to determine whether the user is the primary user of the device (e.g., the owner of the device) or someone else (e.g., a child of the primary user). Thus, there is the possibility of misattribution of media impressions within the panel data collected using mobile devices. In some examples, to overcome the issue of misattribution in the panel data, the AME 104 may develop a machine learning model that can predict the true user of a mobile device (or any device for that matter) based on information that the AME 104 does know for certain and/or has access to. For example, inputs to the machine learning model may include the composition of the panelist household, the type (e.g., genre and/or category) of the content, the daypart or time of day when the content was accessed, etc. In some examples, the truth data used to generate and validate such a model may be collected through field surveys in which the above input features are tracked and/or monitored for a subset of panelists that have agreed to be monitored in this manner (which is more intrusive than the typical passive monitoring of content accessed via mobile devices).

As mentioned above, in some examples, the AME panel data (stored in the AME panel data database 122) is merged with the database proprietor impressions data (stored in the matchable impressions database 118) within the privacy-protected cloud environment 106 to take advantage of the combination of the disparate sets of data to generate more robust and/or reliable audience measurement metrics. In particular, the database proprietor impressions data provides the advantage of volume. That is, the database proprietor impressions data corresponds to a much larger number of impressions than the AME panel data because the database proprietor impressions data includes census wide impression information that includes all impressions collected from both the panelist client devices 112 (associated with a relatively small pool of audience members) and the non-panelist client devices 114. The AME panel data provides the advantage of high-quality demographic data for a statistically significant pool of audience members (e.g., panelists) that may be used to correct for errors and/or biases in the database proprietor impressions data.

One source of error in the database proprietor impressions data is that the demographic information for matchable users collected by the database proprietor 102 during user registration may not be truthful. In particular, in some examples, many database proprietors impose age restrictions on their user accounts (e.g., a user must be at least 13 years of age, at least 18 years of age, etc.). However, when a person registers with the database proprietor 102, the user typically self-declares their age and may, therefore, lie about their age (e.g., an 11 year old may say they are 18 to bypass the age restrictions for a user account). Independent of age restrictions, a particular user may choose to enter an incorrect age for any other reason or no reason at all (e.g., a 44 year old may choose to assert they are only 25). Where a database proprietor 102 does not verify the self-declared age of users, there is a relatively high likelihood that the ages of at least some registered users of the database proprietor stored in the matchable impressions database 118 (as a particular user-based covariate) are inaccurate. Further, it is possible that other self-declared demographic information (e.g., gender, race, ethnicity, income level, etc.) may also be falsified by users during registration. As described further below, the AME panel data (which contains reliable demographic information about the panelists) can be used to correct for inaccurate demographic information in the database proprietor impressions data.

Another source of error in the database proprietor impressions data is based on the concept of misattribution, which arises in situations where multiple different people use the same client device 110 to access media. In some examples, the database proprietor 102 associates a particular impression to a particular user based on the user being signed into a platform provided by the database proprietor. For example, if a particular person signs into their Google account and begins watching a YouTube video on a particular client device 110, that person will be attributed with an impression for an ad served during the video because the person was signed in at the time. However, there may be instances where the person finishes using the client device 110 but does not sign out of his or her Google account. Thereafter, a second different person (e.g., a different member in the family of the first person) begins using the client device 110 to view another YouTube video. Although the second person is now accessing media via the client device 110, ad impressions during this time will still be attributed to the first person because the first person is the one who is still indicated as being signed in. Thus, there are likely to be circumstances where the actual person exposed to media 108 is misattributed to a different registered user of the database proprietor 102. The AME panel data (which includes an indication of the actual person using the panelist client devices 112 at any given moment) can be used to correct for misattribution in the demographic information in the database proprietor impressions data. As mentioned above, in some situations, the AME panel data may itself include misattribution errors. Accordingly, in some examples, the AME panel data may first be corrected for misattribution before the AME panel data is used to correct misattribution in the database proprietor impressions data. An example methodology to correct for misattribution in the database proprietor impressions data is described in Singh et al., U.S. Pat. No. 10,469,903, which is hereby incorporated herein by reference in its entirety.

Another problem with the database proprietor impressions data is that of non-coverage. Non-coverage refers to impressions recorded by the database proprietor 102 that cannot be matched to a particular registered user of the database proprietor 102. The inability of the database proprietor 102 to match a particular impression to a particular user can occur for several reasons including that the user is not signed in at the time of the media impression, that the user has not established an account with the database proprietor 102, that the user has enabled Limited Ad Tracking (LAT) to prevent the user account from being associated with ad impressions, or that the content associated with the media being monitored corresponds to children's content (for which user-based tracking is not performed). While the inability of the database proprietor 102 to match and assign a particular impression to a particular user is not necessarily an error in the database proprietor impressions data, it does undermine the ability to reliably estimate the total unique audience size for (e.g., the number of unique individuals that were exposed to) a particular media item. For example, assume that the database proprietor 102 records a total of 11,000 impressions for media 108 in a particular advertising campaign. Further assume that of those 11,000 impressions, the database proprietor 102 is able to match 10,000 impressions to a total of 5,000 different users (e.g., each user was exposed to the media on average 2 times) but is unable to match the remaining 1,000 impressions to particular users. Relying solely on the database proprietor impressions data, in this example, there is no way to determine whether the remaining 1,000 impressions should also be attributed to the 5,000 users already exposed at least once to the media 108 (for a total audience size of 5,000 people) or if one or more of the remaining 1,000 impressions should be attributed to other users not among the 5,000 already identified (for a total audience size of up to 6,000 people, if every one of the 1,000 impressions was associated with a different person not included in the matched 5,000 users). In some examples disclosed herein, the AME panel data can be used to estimate the distribution of impressions across different users associated with the non-coverage portion of impressions in the database proprietor impressions data to thereby estimate a total audience size for the relevant media 108.
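The arithmetic of the non-coverage example above can be sketched in a few lines of Python. This is only an illustration of the bounds stated in the text; the function name `unique_audience_bounds` and its parameters are hypothetical and do not appear in the disclosure.

```python
def unique_audience_bounds(matched_users, unmatched_impressions):
    """Return (lower, upper) bounds on the unique audience size.

    Lower bound: every unmatched impression belongs to an already matched
    user (or to a single unknown person if there are no matched users).
    Upper bound: every unmatched impression belongs to a different,
    previously uncounted person.
    """
    lower = max(matched_users, 1 if unmatched_impressions > 0 else 0)
    upper = matched_users + unmatched_impressions
    return lower, upper


# Figures from the example: 11,000 impressions, 10,000 of them matched
# to 5,000 users, leaving 1,000 unmatched impressions.
total_impressions = 11_000
matched_impressions = 10_000
matched_users = 5_000

average_frequency = matched_impressions / matched_users
lower, upper = unique_audience_bounds(
    matched_users, total_impressions - matched_impressions
)
print(average_frequency, lower, upper)
```

The gap between the two bounds is the uncertainty that the AME panel data is used to narrow.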

Another confounding factor to the estimation of the total unique audience size for media based on the database proprietor impressions data is the existence of multiple user accounts of a single user. More particularly, in some situations a particular individual may establish multiple accounts with the database proprietor 102 for different purposes (e.g., a personal account, a work account, a joint account shared with other individuals, etc.). Such a situation can result in a larger number of different users being identified as audience members to media 108 than the actual number of individuals exposed to the media 108. For example, assume that a particular person registers three user accounts with the database proprietor 102 and is exposed to the media 108 once while signed into each of the three different accounts for a total of three impressions. In this scenario, the database proprietor 102 would match each impression to a different user based on the different user accounts, making it appear that three different people were exposed to the media 108 when, in fact, only one person was exposed to the media three different times. Examples disclosed herein use the AME panel data in conjunction with the database proprietor impressions data to estimate an actual unique audience size from the potentially inflated number of apparently unique users exposed to the media 108.

In the illustrated example of FIG. 1, the AME panel data is merged with the database proprietor impressions data by an example data matching analyzer 128. In some examples, the data matching analyzer 128 implements an application programming interface (API) that takes the disparate datasets and matches users in the database proprietor impressions data with panelists in the AME panel data. In some examples, users are matched with panelists based on the unique impression identifiers (e.g., CPNs) collected in connection with the media impressions logged by both the database proprietor 102 and the AME 104. The combined data is stored in an intermediary merged data database 130 within an AME privacy-protected data store 132. The data in the intermediary merged data database 130 is referred to as “intermediary” because it is at an intermediate stage in the processing: it includes AME panel data that has been enhanced and/or combined with the database proprietor impressions data, but has not yet been corrected or adjusted to account for the sources of error and/or bias in the database proprietor impressions data as outlined above.
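As a minimal sketch of the matching step described above, the following Python joins panel impressions to database proprietor impressions on a shared impression identifier. The record layout and the names `merge_by_impression_id`, `impression_id`, `panelist_id`, and `user_id` are assumptions made for illustration; the disclosure does not specify the API of the data matching analyzer 128.

```python
def merge_by_impression_id(panel_impressions, proprietor_impressions):
    """Inner-join two impression logs on their shared impression identifier.

    Each input is a list of dicts. Panel records are assumed to carry
    'impression_id' and 'panelist_id'; proprietor records are assumed to
    carry 'impression_id' and 'user_id' (hypothetical field names).
    """
    # Index the (larger) proprietor log once for constant-time lookups.
    by_id = {imp["impression_id"]: imp for imp in proprietor_impressions}

    merged = []
    for panel_imp in panel_impressions:
        match = by_id.get(panel_imp["impression_id"])
        if match is not None:
            merged.append(
                {
                    "impression_id": panel_imp["impression_id"],
                    "panelist_id": panel_imp["panelist_id"],
                    "user_id": match["user_id"],
                }
            )
    return merged


panel = [
    {"impression_id": "cpn-1", "panelist_id": "p-7"},
    {"impression_id": "cpn-2", "panelist_id": "p-8"},
]
proprietor = [
    {"impression_id": "cpn-1", "user_id": "u-100"},
    {"impression_id": "cpn-3", "user_id": "u-200"},
]
intermediary = merge_by_impression_id(panel, proprietor)
print(intermediary)
```

Only impressions logged by both parties survive the join, which is why the merged data covers panelists rather than the full census of impressions.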

In some examples, the AME intermediary merged data is analyzed by an adjustment factor analyzer 134 to calculate adjustment or calibration factors that may be stored in an adjustment factors database 136 within an AME output data store 138 of the AME proprietary cloud environment 126. In some examples, the adjustment factor analyzer 134 calculates different types of adjustment factors to account for different types of errors and/or biases in the database proprietor impressions data. For instance, a multi-account adjustment factor corrects for the situation of a single user accessing media using multiple different user accounts associated with the database proprietor 102. A signed-out adjustment factor corrects for non-coverage associated with users that access media while signed out of their account associated with the database proprietor 102 (so that the database proprietor 102 is unable to associate the impression with the users). In some examples, the adjustment factor analyzer 134 is able to directly calculate the multi-account adjustment factor and the signed-out adjustment factor in a deterministic manner.

While the multi-account adjustment factors and the signed-out adjustment factors may be deterministically calculated, correcting for falsified or otherwise incorrect demographic information (e.g., incorrectly self-declared ages) of registered users of the database proprietor 102 cannot be solved in such a direct and deterministic manner. Rather, in some examples, a machine learning model is developed to analyze and predict the correct ages of registered users of the database proprietor 102. Specifically, as shown in FIG. 1, the privacy-protected cloud environment 106 implements a model generator 140 to generate a demographic correction model using the AME intermediary merged data (stored in the AME intermediary merged data database 130) as inputs. More particularly, in some examples, self-declared demographics (e.g., the self-declared age) of users of the database proprietor 102, along with other covariates associated with the users, are used as the input variables or features used to train a model to predict the correct demographics (e.g., correct age) of the users as validated by the AME panel data, which serves as the truth data or training labels for the model generation. In some examples, different demographic correction model(s) may be developed to correct for different types of demographic information that needs correcting. For instance, in some examples, a first model can be used to correct the self-declared age of users of the database proprietor 102 and a second model can be used to correct the self-declared gender of the users. Once the model(s) have been trained and validated based on the AME panel data, the model(s) are stored in a demographic correction models database 142.

As mentioned above, there are many different types of covariates collected and/or generated by the database proprietor 102. In some examples, the covariates provided by the database proprietor 102 may include a certain number (e.g., 100) of the top search result click entities and/or video watch entities for every user during a most recent period of time (e.g., for the last month). As an example, the covariates may include the top 100 search result click entities and video watch entities (e.g., YouTube video watch entities) for each user for the last month. In examples disclosed herein, entities are represented as integer identifiers (IDs) that map to a knowledge graph of all entities for the search result clicks and/or videos watched. For example, the IDs may map to Freebase knowledge graph entity IDs. That is, as used in this context, an entity corresponds to a particular node in a knowledge graph maintained by the database proprietor 102. In some examples, the total number of unique IDs in the knowledge graph may number in the tens of millions. More particularly, for example, YouTube videos are classified across roughly 20 million unique video entity IDs and Google search results are classified across roughly 25 million unique search result entity IDs. In addition to the top search result click entities and/or video watch entities, the database proprietor 102 may also provide embeddings for these entities. An embedding is a numerical representation (e.g., a vector array of values) of some class of similar objects, images, words, and the like. For example, a particular user that frequently searches for and/or views cat videos may be associated with a feature embedding representative of the class corresponding to cats. Thus, feature embeddings translate relatively high dimensional vectors of information (e.g., text strings, images, videos, etc.) into a lower dimensional space to enable the classification of different but similar objects.

In some examples, multiple embeddings may be associated with each search result click entity and/or video watch entity. Accordingly, assuming the top 100 search result entities and video watch entities are provided among the covariates and that 16-dimension embeddings are provided for each such entity, this results in a 100×16 matrix of values for each user, which may be too much data to process during generation of the demographic correction models as described above. Accordingly, in some examples, the privacy-protected cloud environment 106 implements a data modifier 135 to reduce the dimensionality of the matrix. The reduced matrix may be a more manageable size to be used as an input feature for the model generator 140 to generate the demographic correction model. The demographic correction model may be used to predict demographics (e.g., age, gender) of actual users associated with media impressions.

In some examples, reduction in the entity-embeddings matrix is accomplished by the data modifier 135 selecting the top m entities (e.g., knowledge graph entities) and the top n embeddings and converting the 2-dimensional matrix to a 1-dimensional array. In such examples, the top m entities and the top n embeddings are both hyperparameters: m represents the number of top entities to select from the matrix, and n represents the number of top embeddings to select from the matrix. For example, Table 1 below shows the matrix including entries for 4 different embeddings across 3 different knowledge graph entities (e.g., corresponding to a 3×4 matrix comparable to the 100×16 matrix discussed above) for a user (e.g., user_id=1).

TABLE 1

user_id  ent  emb_1  emb_2  emb_3  emb_4
1        1    1      2      3      4
1        2    5      6      7      8
1        3    9      10     11     12

Assuming that m=2 (e.g., the top 2 entities are to be selected) and n=2 (e.g., the top 2 embeddings are to be selected), the 3×4 matrix represented in Table 1 may be converted to 4 columns (e.g., the 100×16 matrix discussed above converted to m×n columns). The 4 columns are a 4-element array (e.g., m=2 and n=2) as represented in Table 2 below.

TABLE 2

user_id  ent_1_emb_1  ent_1_emb_2  ent_2_emb_1  ent_2_emb_2
1        1            2            5            6
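The top-m/top-n selection illustrated by Tables 1 and 2 can be sketched as follows. This is a minimal illustration, assuming that the rows and columns of the matrix are already ordered by rank so that "top" means "first"; the function name `select_top_and_flatten` is hypothetical.

```python
def select_top_and_flatten(matrix, m, n):
    """Keep the first m entities (rows) and first n embeddings (columns),
    then flatten the result row by row into a 1-dimensional array.

    Assumes rows and columns are already sorted so that the "top"
    entities and embeddings come first.
    """
    return [value for row in matrix[:m] for value in row[:n]]


# The 3x4 entity-embeddings matrix of Table 1 for user_id = 1.
table_1 = [
    [1, 2, 3, 4],     # entity 1
    [5, 6, 7, 8],     # entity 2
    [9, 10, 11, 12],  # entity 3
]

# m = 2 and n = 2 yield the 4-element array of Table 2.
flat = select_top_and_flatten(table_1, m=2, n=2)
print(flat)  # [1, 2, 5, 6]
```

Applied to the 100×16 case, the same call would produce an array of m×n values per user.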

Additionally or alternatively, in some examples, the dimension of entities is reduced by the data modifier 135 calculating a weighted average of the embeddings across the entities. In such examples, weights and a scale for softmax weights (e.g., a softmax weights scale) are both hyperparameters to calculate the weighted average of the embeddings across the entities. For example, Table 3 below represents the original weight definitions for every entity:

TABLE 3

ent id  weights  equal weights  ranking weights   softmax weights scale = i
1       w1       =1/3           =3/sum(3, 2, 1)   =exp(3*i)/sum(exp(3*i), exp(2*i), exp(1*i))
2       w2       =1/3           =2/sum(3, 2, 1)   =exp(2*i)/sum(exp(3*i), exp(2*i), exp(1*i))
3       w3       =1/3           =1/sum(3, 2, 1)   =exp(1*i)/sum(exp(3*i), exp(2*i), exp(1*i))

A simplified weight definition may be defined by multiplying the original weight by the weight denominator (e.g., simplified weight definition=(original weight)×(weight denominator)) as shown in Table 4 below. For example, the weight denominator in the “equal weights” column of Table 3 above is three. Therefore, every entry in the “equal weights” column of Table 3 above is multiplied by three for the simplified weight definition (e.g., simplified weight definition=(equal weight)×(3)). The results are shown in the “equal weights” column of Table 4 below. Similarly, the weight denominator in the “ranking weights” column of Table 3 above is sum(3,2,1). Therefore, every entry in the “ranking weights” column of Table 3 above is multiplied by sum(3,2,1) for the simplified weight definition (e.g., simplified weight definition=(ranking weight)×sum(3,2,1)). The results are shown in the “ranking weights” column of Table 4 below. Likewise, the weight denominator in the “softmax weights scale=i” column of Table 3 above is sum(exp(3*i),exp(2*i),exp(1*i)). Therefore, every entry in the “softmax weights scale=i” column of Table 3 above is multiplied by sum(exp(3*i),exp(2*i),exp(1*i)) for the simplified weight definition (e.g., simplified weight definition=(softmax weight)×sum(exp(3*i),exp(2*i),exp(1*i))). The results are shown in the “softmax weights scale=i” column of Table 4 below.

TABLE 4

ent id  weights  equal weights  ranking weights  softmax weights scale = i
1       w1       =1             =3               =exp(3*i)
2       w2       =1             =2               =exp(2*i)
3       w3       =1             =1               =exp(1*i)

Based on the above weights calculated by the data modifier 135 as shown in Tables 3 and 4, the example 3×4 matrix may be converted to 4 columns (e.g., the 100×16 matrix discussed above converted to 16 columns). The 4 columns (a single row for each user) including a single value for each embedding are represented in entries shown in Table 5 below.

TABLE 5

user_id  weighted emb 1          weighted emb 2          weighted emb 3          weighted emb 4
1        =(w1*1 + w2*5 + w3*9)   =(w1*2 + w2*6 + w3*10)  =(w1*3 + w2*7 + w3*11)  =(w1*4 + w2*8 + w3*12)
         /(w1 + w2 + w3)         /(w1 + w2 + w3)         /(w1 + w2 + w3)         /(w1 + w2 + w3)

Additionally or alternatively, in some examples, the data modifier 135 reduces the dimension of embeddings by using a single value to represent the different embedding dimensions. Reducing the multiple dimensions to a single value may be accomplished with one or more hyperparameters. The multiple dimensions can be reduced to a single value by, for example, calculating the average of the embeddings (e.g., Average=average(1, 2, 3, 4)), selecting the maximum embedding value (e.g., Maximum=max(1, 2, 3, 4)), selecting the minimum value (e.g., Minimum=min(1, 2, 3, 4)), calculating the Manhattan distance (e.g., Manhattan Distance=sum(abs(1), abs(2), abs(3), abs(4))), calculating the Chebyshev distance (e.g., Chebyshev Distance=max(abs(1), abs(2), abs(3), abs(4))), calculating the Euclidean distance (e.g., Euclidean Distance=sum[abs(1)^2, abs(2)^2, abs(3)^2, abs(4)^2]^(1/2)), and/or calculating the Minkowski distance (e.g., Minkowski Distance=sum[abs(1)^3, abs(2)^3, abs(3)^3, abs(4)^3]^(1/3)). Example equations to solve for one entity (e.g., ent_id=1) for different hyperparameters are shown in Table 6 below.

TABLE 6

Hyperparameter      Equation for ent_id = 1
Average             =average(1, 2, 3, 4)
Maximum             =max(1, 2, 3, 4)
Minimum             =min(1, 2, 3, 4)
Manhattan Distance  =sum(abs(1), abs(2), abs(3), abs(4))
Chebyshev Distance  =max(abs(1), abs(2), abs(3), abs(4))
Euclidean Distance  =sum[abs(1)^2, abs(2)^2, abs(3)^2, abs(4)^2]^(1/2)
Minkowski Distance  =sum[abs(1)^3, abs(2)^3, abs(3)^3, abs(4)^3]^(1/3)

As a specific example, assuming the average of the embeddings is used, the 3×4 matrix represented in Table 1 may be converted to 3 columns (e.g., the 100×16 matrix discussed above converted to 100 columns). The 3 columns (a single row of three elements for each user) include a single value for each entity as represented in entries shown in Table 7 below.

TABLE 7

user_id  reduced emb 1         reduced emb 2         reduced emb 3
1        =average(1, 2, 3, 4)  =average(5, 6, 7, 8)  =average(9, 10, 11, 12)
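The two reductions described above (a weighted average across entities, as in Tables 3 to 5, and a single value per entity, as in Tables 6 and 7) can be sketched in plain Python. This is an illustrative sketch only: the function names, the `scheme` and `method` strings, and the convention that the top entity receives the highest rank are assumptions drawn from the example tables, not a statement of how the data modifier 135 is implemented.

```python
import math


def entity_weights(num_entities, scheme="equal", scale=1.0):
    """Normalized per-entity weights (Table 3). `scale` is the softmax scale i."""
    ranks = range(num_entities, 0, -1)  # top entity gets the highest rank
    if scheme == "equal":
        raw = [1.0] * num_entities
    elif scheme == "ranking":
        raw = [float(r) for r in ranks]
    elif scheme == "softmax":
        raw = [math.exp(r * scale) for r in ranks]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)  # the "weight denominator" of Table 3
    return [w / total for w in raw]


def weighted_average_embeddings(matrix, weights):
    """Collapse the entity dimension: one weighted average per embedding (Table 5)."""
    total = sum(weights)
    return [
        sum(w * row[j] for w, row in zip(weights, matrix)) / total
        for j in range(len(matrix[0]))
    ]


def reduce_embedding(values, method="average"):
    """Collapse one entity's embedding vector to a single value (Table 6)."""
    magnitudes = [abs(v) for v in values]
    if method == "average":
        return sum(values) / len(values)
    if method == "maximum":
        return max(values)
    if method == "minimum":
        return min(values)
    if method == "manhattan":
        return sum(magnitudes)
    if method == "chebyshev":
        return max(magnitudes)
    if method == "euclidean":
        return sum(a ** 2 for a in magnitudes) ** (1 / 2)
    if method == "minkowski":  # order 3, as in Table 6
        return sum(a ** 3 for a in magnitudes) ** (1 / 3)
    raise ValueError(f"unknown method: {method}")


table_1 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# Table 5 with ranking weights: 4 values, one per embedding.
per_embedding = weighted_average_embeddings(table_1, entity_weights(3, "ranking"))

# Table 7: 3 values, one per entity.
per_entity = [reduce_embedding(row, "average") for row in table_1]
print(per_embedding, per_entity)
```

Because `weighted_average_embeddings` divides by the sum of the weights, it gives the same result whether it is passed the normalized weights of Table 3 or the simplified weights of Table 4.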

In some examples, the number of unique entities (e.g., search result clicks and/or videos watched) represented in the covariates for a particular user may be less than the total number of different entities the database proprietor 102 provides. For example, the database proprietor 102 may provide the top 100 entities for each panelist. However, a particular user may only be associated with 87 different entities, thereby resulting in 13 null entities. When the number of null entities is non-zero, it is possible to infer the number of different topics the particular user is interested in (e.g., corresponding to the number of different non-null entities associated with the user). In some examples, assuming the database proprietor 102 provides the top 100 entities for the particular user, the number of different topics the particular user is interested in may be in a range [0,100]. Similarly, not all entities have embeddings (e.g., if a particular search result click entity and/or video watch entity has not been clicked/viewed by a sufficient number of users); such entities may instead be represented by a null struct or null embeddings. The number of null embeddings can be used to infer how many rare topics the particular user is interested in. In examples disclosed herein, a rare topic is an entity (e.g., topic) that includes no embeddings. In some examples, assuming the database proprietor 102 provides the top 100 entities for the particular user, the number of rare topics the particular user is interested in may be in a range [0,100] or NULL (e.g., the particular user does not have any entities). In some examples, the number of null entities (and/or the number of non-null entities) and/or the number of null embeddings for a particular user may serve as additional input features for the demographic correction model generation process.

For example, as represented in Table 8 below, user 1 has a normal embedding array for entity 1, an array of nulls associated with entity 2 (e.g., corresponding to 1 null embedding), and no entity identified for entity 3 (e.g., corresponding to 1 null entity).

TABLE 8
user_id   ent   ent_id   emb_0   emb_1   emb_2   emb_3
1         1     A        1       2       3       4
1         2     B        null    null    null    null
1         3

In this example, two additional columns may be added to reflect the null counts as represented in Table 9 below.

TABLE 9
user_id   count_null_entities   count_null_embeddings
1         1                     1
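The null counts of Table 9 can be derived from rows shaped like Table 8, as in the sketch below. The row layout (a dict with `ent_id` and `embeddings` keys) and the function name `count_nulls` are assumptions made for illustration. The sketch also assumes that a row with no entity is counted only as a null entity and not additionally as a null embedding, which is the reading that reproduces the counts in Table 9.

```python
def count_nulls(entity_rows):
    """Count null entities and null embeddings for one user's entity rows."""
    null_entities = 0
    null_embeddings = 0
    for row in entity_rows:
        if row["ent_id"] is None:
            # No entity was identified for this slot.
            null_entities += 1
        elif all(value is None for value in row["embeddings"]):
            # An entity exists, but it carries no embeddings (a "rare topic").
            null_embeddings += 1
    return {
        "count_null_entities": null_entities,
        "count_null_embeddings": null_embeddings,
    }


# User 1 from Table 8.
user_1 = [
    {"ent": 1, "ent_id": "A", "embeddings": [1, 2, 3, 4]},
    {"ent": 2, "ent_id": "B", "embeddings": [None, None, None, None]},
    {"ent": 3, "ent_id": None, "embeddings": []},
]
null_counts = count_nulls(user_1)
```

The two resulting counts could then be appended to the user's reduced feature row as the additional columns described above.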

In some examples, a process is implemented to track different demographic correction model experiments over time to achieve high quality (e.g., accurate) models and also for auditing purposes. Accomplishing this objective within the context of the privacy-protected cloud environment 106 presents several unique challenges because the model features (e.g., inputs and hyperparameters) and model performance (e.g., accuracy) are stored separately to satisfy the privacy constraints of the environment.

In some examples, a model analyzer 144 may implement and/or use one or more demographic correction models to generate predictions and/or inferences as to the actual demographics (e.g., actual ages) of users associated with media impressions logged by the database proprietor 102. That is, in some examples, as shown in FIG. 1, the model analyzer 144 uses one or more of the demographic correction models in the demographic correction models database 142 to analyze the impressions in the enriched impressions database 120 that were matched to a particular user of the database proprietor 102. The inferred demographic (e.g., age) for each user may be stored in a model inferences database 146 for subsequent use, retrieval, and/or analysis. Additionally or alternatively, in some examples, the model analyzer 144 uses one or more of the demographic correction models in the demographic correction models database 142 to analyze the entire user base of the database proprietor regardless of whether the users are matched to any particular media impressions. After inferring the correct demographic (e.g., age) for each user, the inferences are stored in the model inferences database 146. In some such examples, when the users matched to particular impressions are to be analyzed (e.g., the users matched to impressions in the enriched impressions database 120), the model analyzer 144 merely extracts the inferred demographic assignment to each relevant user in the enriched impressions database 120 that matches with one or more media impressions.

As described above, in some examples, the database proprietor 102 may identify a particular user as corresponding to a particular impression based on the user being signed into the database proprietor 102. However, there are circumstances where the individual corresponding to the user account is not the actual person that was exposed to the relevant media. Accordingly, merely inferring a correct demographic (e.g., age) of the user associated with the signed in user account may not be the correct demographic of the actual person to which a particular media impression should be attributed. In other words, whereas the AME panelist data and the database proprietor impressions data are matched at the impression level, demographic correction is implemented at the user level. Therefore, before generating the demographic correction model, a method to reduce logged impressions to individual users is first implemented so that the demographic correction model can be reliably implemented.

With inferences made to correct inaccurate demographic information of database proprietor users (e.g., falsified self-declared ages) and stored in the model inferences database 146, the AME 104 may be interested in extracting audience measurement metrics based on the corrected data. However, as mentioned above, the data contained inside the privacy-protected cloud environment 106 is subject to privacy constraints. In some examples, the privacy constraints ensure that the data can only be extracted for review and/or analysis in aggregate so as to protect the privacy of any particular individual represented in the data (e.g., a panelist of the AME 104 and/or a registered user of the database proprietor 102). Accordingly, in some examples, a data aggregator 148 aggregates the audience measurement data associated with particular media campaigns before the data is provided to an aggregated campaign data database 150 in the AME output data store 138 of the AME proprietary cloud environment 126.

The data aggregator 148 may aggregate data in different ways for different types of audience measurement metrics. For instance, at the highest level, the aggregated data may provide the total impression count and total number of users (e.g., estimated audience size) exposed to the media 108 for a particular media campaign. As mentioned above, the total number of users reported by the data aggregator 148 is based on the total number of unique user accounts matched to impressions but does not include the individuals associated with impressions that were not matched to a particular user (e.g., non-coverage). However, the total number of unique user accounts does not account for the fact that a single individual may correspond to more than one user account (e.g., multi-account users), and does not account for situations where a person other than a signed-in user was exposed to the media 108 (e.g., misattribution). These errors in the aggregated data may be corrected based on the adjustment factors stored in the adjustment factors database 136. Further, in some examples, the aggregated data may include an indication of the demographic composition of the users represented in the aggregated data (e.g., number of males vs females, number of users in different age brackets, etc.).

Additionally or alternatively, in some examples, the data aggregator 148 may provide aggregated data that is associated with a particular aspect of a media campaign. For instance, the data may be aggregated based on particular sites (e.g., all media impressions served on YouTube.com). In other examples, the data may be aggregated based on placement information (e.g., aggregated based on particular primary content videos accessed by users when the media advertisement was served). In other examples, the data may be aggregated based on device type (e.g., impressions served via a desktop computer versus impressions served via a mobile device). In other examples, the data may be aggregated based on a combination of one or more of the above factors and/or based on any other relevant factor(s).
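As a rough sketch of the aggregation just described, the following Python groups matched impressions by any combination of attributes and reports the impression count and the number of unique user accounts per group. The record fields (`user_id`, `site`, `device`) and the function name `aggregate_impressions` are hypothetical, and the sketch does not apply the adjustment factors for non-coverage, multi-account users, or misattribution.

```python
from collections import defaultdict


def aggregate_impressions(impressions, keys=()):
    """Return impression and unique-user counts per group of `keys`."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for imp in impressions:
        group = tuple(imp[k] for k in keys)  # () means campaign-level totals
        counts[group] += 1
        users[group].add(imp["user_id"])
    return {
        group: {"impressions": counts[group], "unique_users": len(users[group])}
        for group in counts
    }


# Illustrative records only; not data from the disclosure.
impressions = [
    {"user_id": "u1", "site": "YouTube.com", "device": "mobile"},
    {"user_id": "u1", "site": "YouTube.com", "device": "desktop"},
    {"user_id": "u2", "site": "YouTube.com", "device": "mobile"},
]
campaign_totals = aggregate_impressions(impressions)
by_device = aggregate_impressions(impressions, keys=("device",))
by_site_and_device = aggregate_impressions(impressions, keys=("site", "device"))
```

Passing more than one key covers the case where data is aggregated on a combination of factors.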

In some examples, the privacy constraints imposed on the data within the privacy-protected cloud environment 106 include a limitation that data cannot be extracted (even when aggregated) for fewer than a threshold number of individuals (e.g., 50 individuals). Accordingly, if the particular metric being sought includes fewer than the threshold number of individuals, the data aggregator 148 will not provide such data. For instance, if the threshold number of individuals is 50 but there are only 46 females in the age range of 18-25 that were exposed to particular media 108, the data aggregator 148 would not provide the aggregate data for females in the 18-25 age bracket. Such privacy constraints can leave gaps in the audience measurement metrics, particularly in locations where the number of panelists is relatively small. Accordingly, in some examples, when audience measurement is not available for a particular demographic segment of interest in a particular region (e.g., a particular country), the audience measurement metrics in one or more comparable region(s) may be used to impute the metrics for the missing data in the first region of interest. In some examples, the particular metrics imputed from comparable regions are based on a comparison of audience metrics for which data is available in both regions. For instance, while data for females in the 18-25 bracket may be unavailable, assume that data for females in the 26-35 age bracket is available. The metrics associated with the 26-35 age bracket in the region of interest may be compared with metrics for the 26-35 age bracket in other regions and the regions with the closest metrics to the region of interest may be selected for use in calculating imputation factor(s).
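The threshold rule and the selection of a comparable region can be illustrated with the short Python sketch below. It is an assumption-laden example: `MIN_INDIVIDUALS`, `releasable`, and `closest_region` are hypothetical names, the metric values are invented for illustration, and closeness is measured here as a simple absolute difference on a single shared segment, which the text does not prescribe.

```python
MIN_INDIVIDUALS = 50  # example threshold from the text


def releasable(segment_counts, threshold=MIN_INDIVIDUALS):
    """Keep only segments that meet the minimum number of individuals."""
    return {seg: n for seg, n in segment_counts.items() if n >= threshold}


def closest_region(target_metric, candidate_metrics):
    """Pick the region whose metric for a shared segment is nearest the target."""
    return min(
        candidate_metrics,
        key=lambda region: abs(candidate_metrics[region] - target_metric),
    )


# 46 females aged 18-25 fall below the threshold, so that segment is withheld.
counts = {("female", "18-25"): 46, ("female", "26-35"): 120}
released = releasable(counts)

# Compare the 26-35 bracket (available in every region) to choose a donor region.
donor = closest_region(0.30, {"region_b": 0.42, "region_c": 0.31})
```

The donor region's metrics for the withheld segment could then feed the imputation factor(s); how those factors are computed is not specified in this passage.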

As shown in the illustrated example, both the adjustment factors database 136 and the aggregated campaign data database 150 are included within the AME output data store 138 of the AME proprietary cloud environment 126. As mentioned above, in some examples, the AME proprietary cloud environment 126 is provided by the database proprietor 102 and enables data to be provided to and retrieved from the privacy-protected cloud environment 106. In some examples, the aggregated campaign data and the adjustment factors are subsequently transferred to a separate computing apparatus 152 of the AME 104 for analysis by an audience metrics analyzer 154. In some examples, the separate computing apparatus 152 may be omitted with its functionality provided by the AME proprietary cloud environment 126. In other examples, the AME proprietary cloud environment 126 may be omitted with the adjustment factors and the aggregated data provided directly to the computing apparatus 152. Further, in this example, the AME panel data database 122 is within the AME first party data store 124, which is shown as being separate from the AME output data store 138. However, in other examples, the AME first party data store 124 and the AME output data store 138 may be combined.

In the illustrated example of FIG. 1, the audience metrics analyzer 154 applies the adjustment factors to the aggregated data to correct for errors in the data including misattribution, non-coverage, and multi-account users. The output of the audience metrics analyzer 154 corresponds to the final calibrated data of the AME 104 and is stored in a final calibrated data database 156. In this example, the computing apparatus 152 also includes a report generator 158 to generate reports based on the final calibrated data.

While an example manner of implementing the privacy-protected cloud environment 106 of FIG. 1 is illustrated in FIG. 1, one or more of the elements, processes and/or devices illustrated in FIG. 1 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example campaign impressions database 116, example matchable impressions database 118, the example enriched campaign impressions database 120, the example data matching analyzer 128, the example AME intermediary merged data database 130, the example AME privacy-protected data store 132, the example adjustment factor analyzer 134, the example data modifier 135, the example model generator 140, the example demographic correction models database 142, the example model analyzer 144, the example model inferences database 146, the example data aggregator 148 and/or, more generally, the example privacy-protected cloud environment 106 of FIG. 1 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware.
Thus, for example, any of the example campaign impressions database 116, example matchable impressions database 118, the example enriched campaign impressions database 120, the example data matching analyzer 128, the example AME intermediary merged data database 130, the example AME privacy-protected data store 132, the example adjustment factor analyzer 134, the example data modifier 135, the example model generator 140, the example demographic correction models database 142, the example model analyzer 144, the example model inferences database 146, the example data aggregator 148 and/or, more generally, the example privacy-protected cloud environment 106 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example campaign impressions database 116, example matchable impressions database 118, the example enriched campaign impressions database 120, the example data matching analyzer 128, the example AME intermediary merged data database 130, the example AME privacy-protected data store 132, the example adjustment factor analyzer 134, the example data modifier 135, the example model generator 140, the example demographic correction models database 142, the example model analyzer 144, the example model inferences database 146, and/or the example data aggregator 148 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware.
Further still, the example privacy-protected cloud environment 106 of FIG. 1 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 1, and/or may include more than one of any or all of the illustrated elements, processes and devices. As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.

A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing aspects of the privacy-protected cloud environment 106 of FIG. 1 is shown in FIG. 2. The machine readable instructions may be one or more executable programs or portion(s) of an executable program for execution by a computer processor and/or processor circuitry, such as the processor 312 shown in the example processor platform 300 discussed below in connection with FIG. 3. The program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 312, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 312 and/or embodied in firmware or dedicated hardware. Further, although the example program is described with reference to the flowchart illustrated in FIG. 2, many other methods of implementing the example privacy-protected cloud environment 106 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware. The processor circuitry may be distributed in different network locations and/or local to one or more devices (e.g., a multi-core processor in a single machine, multiple processors distributed across a server rack, etc.).

The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement one or more functions that may together form a program such as that described herein.

In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.

The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.

As mentioned above, the example processes of FIG. 2 may be implemented using executable instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non-transitory computer readable medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.

“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.

As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” item, as used herein, refers to one or more of that item. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.

FIG. 2 is a flowchart representative of example machine readable instructions 200 that may be executed by a processor (e.g., the processor 312 of FIG. 3) to implement the example data modifier 135, the example model generator 140, and/or the example model analyzer 144 of FIG. 1 to reduce the dimensionality of a matrix associated with entities and embeddings. The example instructions 200 of FIG. 2 begin at block 202, at which the data modifier 135 obtains a first matrix including data associated with entities and embeddings. The first matrix may correspond to a user and may be obtained from the AME intermediary merged data database 130 (FIG. 1). For example, the first matrix may be a 100×16 matrix including 100 entities (e.g., search result entities and/or video watch entities) and 16 dimension embeddings (e.g., classifications of searches entered and/or videos viewed by a user) for each entity.

At block 204, the example data modifier 135 generates a second matrix by reducing the data in the first matrix to second data. For example, the data in the first matrix may be too much data for the model generator 140 (FIG. 1) to process while generating a demographic correction model. Therefore, the second matrix is generated as a more manageable size to be used as an input feature for the model generator 140 to generate the demographic correction model. The generation of the second matrix may be based on performing a reduction technique. In some examples, the data modifier 135 implements the reduction technique of block 204 by selecting the top m entities (e.g., knowledge graph entities) and the top n embeddings and converting the 2-dimensional matrix to a 1-dimensional array (e.g., described in connection to Tables 1 and 2 above). In other examples, the data modifier 135 implements the reduction technique of block 204 by calculating weighted averages of the embeddings associated with the entities (e.g., as described in connection to Tables 3-5 above). In yet other examples, the data modifier 135 implements the reduction technique of block 204 by calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities (e.g., as described in connection to Tables 6 and 7 above).
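The first reduction technique named at block 204 (keeping the top m entities and top n embeddings, then flattening) can be sketched as follows. The function name `flatten_top` is hypothetical, and the sketch assumes the rows and columns of the first matrix are already ordered by rank so that "top" means "first"; the ordering criterion itself is not given in this passage.

```python
def flatten_top(matrix, m, n):
    """Keep the first m entity rows and first n embedding columns, then flatten.

    The result is a 1-dimensional list of m * n values, read row by row.
    """
    return [value for row in matrix[:m] for value in row[:n]]


# A 3x4 first matrix (three entities, four embeddings each).
first_matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
second_data = flatten_top(first_matrix, m=2, n=3)  # [1, 2, 3, 5, 6, 7]
```

Choosing m and n fixes the length of the resulting input feature at m × n, which is one way to make the second matrix satisfy a size expected by the model generator 140.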

At block 206, the example data modifier 135 stores the second matrix in memory as an input feature. For example, the example data modifier 135 stores the second matrix in memory when the second matrix satisfies the size to be used as an input feature by the model generator 140 to generate one or more demographic correction models. At block 208, the example model generator 140 generates a demographic correction model based on the second matrix as the input feature. Alternatively, the demographic correction model may be generated based on a plurality of input features. In such examples, the second matrix may be one of several input features. For example, the model generator 140 may generate the demographic correction model by utilizing the one or more input features to train and validate the demographic correction model. The demographic correction model may be utilized to correct demographic information from the database proprietor 102 associated with impressions.

At block 210, the example model generator 140 stores the demographic correction model in the example demographic correction models database 142. For example, the example model generator 140 may store a plurality of demographic correction models in the demographic correction models database 142 for different demographics. In some examples, a first demographic correction model corrects age from the database proprietor 102 associated with impressions. In other examples, a second demographic correction model corrects gender from the database proprietor 102 associated with impressions. The example model analyzer 144 (FIG. 1) can access different ones of the demographic correction models from the demographic correction models database 142 based on the type of demographic information that is to be corrected.

At block 212, the example model analyzer 144 applies the demographic correction model from the demographic correction models database 142 to predict demographics associated with impressions. In some examples, a media impression is already associated with demographics based on a user signed into the database proprietor 102. However, there are cases where the demographics reported by the user are incorrect. For example, a user may self-report as a male of an age between 30 and 34, whereas the actual demographics of the user are a male of an age between 40 and 44. Therefore, the demographics (e.g., of a signed-in subscriber of the database proprietor 102) do not correspond to the actual user exposed to the media. The actual demographics may be predicted by applying one or more demographic correction models. For example, the model analyzer 144 applies the one or more demographic correction models to predict demographics of the signed-in subscriber of the database proprietor 102 to correct computer-generated error such as self-reported demographics corresponding to a subscriber of a user account associated with one or more impressions in the example enriched media campaign impressions database 120. Additionally, in some examples the demographic correction models may predict the actual demographics for circumstances where the individual corresponding to the user account is not the actual user exposed to the relevant media associated with the media impression (e.g., if another user such as a family member or friend is borrowing/using the device).

At block 214, the example model analyzer 144 stores the resulting data generated by the demographic correction model in the example model inferences database 146 (FIG. 1). For example, the model analyzer 144 stores the predicted demographics of a signed-in subscriber of the database proprietor 102 associated with impressions in the model inferences database 146 so that correct demographics (e.g., the predicted demographics) corresponding to the impressions can be correctly aggregated.

At block 216, the example data modifier 135 determines whether there is another matrix to reduce. In one example, the data modifier 135 analyzes the AME intermediary merged data database 130 to determine whether there is another matrix corresponding to a different user. If the data modifier 135 determines there is another matrix to reduce (e.g., block 216 returns a result of “YES”), control returns to block 202. If the data modifier 135 determines there is not another matrix to reduce (e.g., block 216 returns a result of “NO”), the example instructions 200 of FIG. 2 terminate.

FIG. 3 is a block diagram of an example processor platform 300 structured to execute the instructions of FIG. 2 to implement the privacy-protected cloud environment 106 of FIG. 1. The processor platform 300 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), or any other type of computing device.

The processor platform 300 of the illustrated example includes a processor 312. The processor 312 of the illustrated example is hardware. For example, the processor 312 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example data matching analyzer 128, the example AME privacy-protected data store 132, the example adjustment factor analyzer 134, the example data modifier 135, the example model generator 140, the example model analyzer 144, and/or the example data aggregator 148.

The processor 312 of the illustrated example includes a local memory 313 (e.g., a cache). The processor 312 of the illustrated example is in communication with a main memory including a volatile memory 314 and a non-volatile memory 316 via a bus 318. The volatile memory 314 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 316 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 314, 316 is controlled by a memory controller.

The processor platform 300 of the illustrated example also includes an interface circuit 320. The interface circuit 320 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.

In the illustrated example, one or more input devices 322 are connected to the interface circuit 320. The input device(s) 322 permit(s) a user to enter data and/or commands into the processor 312. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.

One or more output devices 324 are also connected to the interface circuit 320 of the illustrated example. The output devices 324 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 320 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.

The interface circuit 320 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 326. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.

The processor platform 300 of the illustrated example also includes one or more mass storage devices 328 for storing software and/or data. Examples of such mass storage devices 328 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.

The machine executable instructions 332 of FIG. 2 may be stored in the mass storage device 328, in the volatile memory 314, in the non-volatile memory 316, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.

From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that enable the generation of accurate and reliable audience measurement metrics for Internet-based media without the use of third-party cookies and/or tags that have been the standard approach for monitoring Internet media for many years. This is accomplished by merging AME panel data with database proprietor impressions data within a privacy-protected cloud based environment. The nature of the cloud environment and the privacy constraints imposed thereon as well as the nature in which the database proprietor collects the database proprietor impression data present technological challenges contributing to limitations in the reliability and/or completeness of the data. However, examples disclosed herein overcome these difficulties by generating adjustment factors and/or machine learning models based on the AME panel data.

Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.

Example methods and apparatus to generate audience metrics using third-party privacy-protected cloud environments are disclosed herein. Further examples and combinations thereof include the following:

Example 1 includes a non-transitory computer readable medium including instructions that when executed cause at least one processor to obtain a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched, generate a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature, store the second matrix in first memory as the input feature, and generate a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.

Example 2 includes the non-transitory computer readable medium of example 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by selecting a first entity from the entities and first and second embeddings from the embeddings, and generating the second matrix to include a first entry and a second entry, the first entry storing a first value of the first embedding associated with the first entity, the second entry storing a second value of the second embedding associated with the first entity.
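The selection-based reduction of Example 2 can be sketched as follows. This is a minimal illustration only, assuming a hypothetical layout in which each entity identifier maps to a list of embedding values; the names `FirstMatrix` and `reduce_by_selection` and the sample values are not from this disclosure.

```python
from typing import Dict, List

# Hypothetical layout: each entity ID maps to its list of embedding values.
FirstMatrix = Dict[int, List[float]]


def reduce_by_selection(
    first_matrix: FirstMatrix, entity_id: int, embedding_indices: List[int]
) -> List[float]:
    """Return the selected embedding values for one selected entity."""
    row = first_matrix[entity_id]
    return [row[i] for i in embedding_indices]


# Illustrative data only; real entities and embeddings would come from
# the database proprietor.
first_matrix: FirstMatrix = {
    101: [0.2, 0.9, 0.4, 0.1],
    202: [0.7, 0.3, 0.5, 0.8],
}

# Select a first entity (ID 101) and two of its embeddings (columns 0 and 2).
# The second matrix then holds a first entry and a second entry.
second_matrix = [reduce_by_selection(first_matrix, 101, [0, 2])]
print(second_matrix)  # [[0.2, 0.4]]
```

In this sketch the second matrix has one row with two entries, which is smaller than the first matrix and could be sized to match a fixed-width input feature.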

Example 3 includes the non-transitory computer readable medium of example 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by calculating weighted averages of the embeddings associated with the entities, the weighted averages including a first weighted average based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing a value of the first weighted average associated with the first entity.
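A weighted-average reduction such as the one in Example 3 might look like the sketch below. The disclosure does not state how the weights are chosen, so the per-embedding `weights` list here is purely an assumption, as are the function names and the row-per-entity layout.

```python
from typing import List, Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of one entity's embedding values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def reduce_by_weighted_average(
    rows: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[List[float]]:
    """Collapse each entity row into a single weighted-average entry."""
    return [[weighted_average(row, weights)] for row in rows]


# Assumed layout: one row per entity, one column per embedding.
first_matrix = [[1.0, 2.0, 3.0], [4.0, 0.0, 2.0]]
weights = [1.0, 1.0, 2.0]  # hypothetical weights, not from the disclosure

second_matrix = reduce_by_weighted_average(first_matrix, weights)
print(second_matrix)  # [[2.25], [2.0]]
```

Each entity row collapses to one entry, so the width of the second matrix no longer depends on the number of embeddings.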

Example 4 includes the non-transitory computer readable medium of example 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities, the values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.
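The measures named in Example 4 can be illustrated with the sketch below. The disclosure does not specify the reference point for the distances, so this sketch assumes each distance is measured from the origin (that is, it is a norm of the entity's embedding vector); that choice, the function names, and the sample data are assumptions for illustration.

```python
import math
from typing import List, Sequence


def average(v: Sequence[float]) -> float:
    """Arithmetic mean of the embedding values."""
    return sum(v) / len(v)


def manhattan(v: Sequence[float]) -> float:
    """Sum of absolute values (Minkowski distance of order 1)."""
    return sum(abs(x) for x in v)


def euclidean(v: Sequence[float]) -> float:
    """Square root of the sum of squares (Minkowski distance of order 2)."""
    return math.sqrt(sum(x * x for x in v))


def chebyshev(v: Sequence[float]) -> float:
    """Largest absolute value (limit of Minkowski as the order grows)."""
    return max(abs(x) for x in v)


def minkowski(v: Sequence[float], p: float) -> float:
    """General Minkowski distance of order p, for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


# Assumed layout: one row per entity, one column per embedding.
first_matrix = [[3.0, -4.0], [6.0, 8.0]]

# Reduce each entity row to a single entry, here using the Euclidean measure.
second_matrix: List[List[float]] = [[euclidean(row)] for row in first_matrix]
print(second_matrix)  # [[5.0], [10.0]]
```

Any of the five functions could be substituted in the final comprehension; each yields one value per entity.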

Example 5 includes the non-transitory computer readable medium of example 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by selecting at least one of maximum values or minimum values of the embeddings associated with the entities, the at least one of the maximum values or the minimum values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.
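A maximum/minimum reduction along the lines of Example 5 is sketched below, again assuming one row per entity and one column per embedding. The function name `reduce_by_extrema` and the choice to keep both the maximum and the minimum are illustrative assumptions; the example text permits keeping either or both.

```python
from typing import List, Sequence


def reduce_by_extrema(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Keep only the maximum and minimum embedding value for each entity."""
    return [[max(row), min(row)] for row in rows]


# Assumed layout: one row per entity, one column per embedding.
first_matrix = [[0.2, 0.9, 0.4], [0.7, 0.3, 0.5]]

second_matrix = reduce_by_extrema(first_matrix)
print(second_matrix)  # [[0.9, 0.2], [0.7, 0.3]]
```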

Example 6 includes the non-transitory computer readable medium of example 1, wherein the data corresponds to a user access to media, the media associated with the entities and the embeddings.

Example 7 includes the non-transitory computer readable medium of example 1, wherein the entities include at least one of top search result click entities or video watch entities.

Example 8 includes the non-transitory computer readable medium of example 1, wherein the embeddings include classifications of at least one of Internet searches requested by a user or media accessed by the user.

Example 9 includes the non-transitory computer readable medium of example 1, wherein the entities are represented using integer identifiers that map to a knowledge graph.

Example 10 includes the non-transitory computer readable medium of example 1, wherein the embeddings are represented as a numerical representation of a class of at least one of objects, images, or words.

Example 11 includes an apparatus including a data modifier to obtain a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched, generate a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature, and store the second matrix in first memory as the input feature, and a model generator to generate a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.

Example 12 includes the apparatus of example 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by selecting a first entity from the entities and first and second embeddings from the embeddings, and generating the second matrix to include a first entry and a second entry, the first entry storing a first value of the first embedding associated with the first entity, the second entry storing a second value of the second embedding associated with the first entity.

Example 13 includes the apparatus of example 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by calculating weighted averages of the embeddings associated with the entities, the weighted averages including a first weighted average based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing a value of the first weighted average associated with the first entity.

Example 14 includes the apparatus of example 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities, the values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.

Example 15 includes the apparatus of example 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by selecting at least one of maximum values or minimum values of the embeddings associated with the entities, the at least one of the maximum values or the minimum values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.

Example 16 includes the apparatus of example 11, wherein the data corresponds to a user access to media, the media associated with the entities and the embeddings.

Example 17 includes the apparatus of example 11, wherein the entities include at least one of top search result click entities or video watch entities.

Example 18 includes the apparatus of example 11, wherein the embeddings include classifications of at least one of Internet searches requested by a user or media accessed by the user.

Example 19 includes the apparatus of example 11, wherein the entities are represented using integer identifiers that map to a knowledge graph.

Example 20 includes the apparatus of example 11, wherein the embeddings are represented as a numerical representation of a class of at least one of objects, images, or words.

Example 21 includes an apparatus including at least one memory, instructions, and at least one processor to execute the instructions to at least obtain a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched, generate a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature, store the second matrix in first memory as the input feature, and generate a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.

Example 22 includes the apparatus of example 21, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by selecting a first entity from the entities and first and second embeddings from the embeddings, and generating the second matrix to include a first entry and a second entry, the first entry storing a first value of the first embedding associated with the first entity, the second entry storing a second value of the second embedding associated with the first entity.

Example 23 includes the apparatus of example 21, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by calculating weighted averages of the embeddings associated with the entities, the weighted averages including a first weighted average based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing a value of the first weighted average associated with the first entity.

Example 24 includes the apparatus of example 21, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities, the values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.

Example 25 includes the apparatus of example 21, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by selecting at least one of maximum values or minimum values of the embeddings associated with the entities, the at least one of the maximum values or the minimum values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.

Example 26 includes the apparatus of example 21, wherein the data corresponds to a user access to media, the media associated with the entities and the embeddings.

Example 27 includes the apparatus of example 21, wherein the entities include at least one of top search result click entities or video watch entities.

Example 28 includes the apparatus of example 21, wherein the embeddings include classifications of at least one of Internet searches requested by a user or media accessed by the user.

Example 29 includes the apparatus of example 21, wherein the entities are represented using integer identifiers that map to a knowledge graph.

Example 30 includes the apparatus of example 21, wherein the embeddings are represented as a numerical representation of a class of at least one of objects, images, or words.

Example 31 includes a method including obtaining, by executing an instruction with a processor, a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched, generating, by executing an instruction with the processor, a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature, storing, by executing an instruction with the processor, the second matrix in first memory as the input feature, and generating, by executing an instruction with the processor, a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.

Example 32 includes the method of example 31, wherein the generating of the second matrix is based on performing a reduction technique, the performing of the reduction technique including selecting a first entity from the entities and first and second embeddings from the embeddings, and generating the second matrix to include a first entry and a second entry, the first entry storing a first value of the first embedding associated with the first entity, the second entry storing a second value of the second embedding associated with the first entity.

Example 33 includes the method of example 31, wherein the generating of the second matrix is based on performing a reduction technique, the performing of the reduction technique including calculating weighted averages of the embeddings associated with the entities, the weighted averages including a first weighted average based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing a value of the first weighted average associated with the first entity.

Example 34 includes the method of example 31, wherein the generating of the second matrix is based on performing a reduction technique, the performing of the reduction technique including calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities, the values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.

Example 35 includes the method of example 31, wherein the generating of the second matrix is based on performing a reduction technique, the performing of the reduction technique including selecting at least one of maximum values or minimum values of the embeddings associated with the entities, the at least one of the maximum values or the minimum values including a first value based on a first entity from the entities and ones of the embeddings, and generating the second matrix to include an entry storing the first value associated with the first entity.

Example 36 includes the method of example 31, wherein the data corresponds to a user access to media, the media associated with the entities and the embeddings.

Example 37 includes the method of example 31, wherein the entities include at least one of top search result click entities or video watch entities.

Example 38 includes the method of example 31, wherein the embeddings include classifications of at least one of Internet searches requested by a user or media accessed by the user.

Example 39 includes the method of example 31, wherein the entities are represented using integer identifiers that map to a knowledge graph.

Example 40 includes the method of example 31, wherein the embeddings are represented as a numerical representation of a class of at least one of objects, images, or words.

The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.

1. A non-transitory computer readable medium comprising instructions that when executed cause at least one processor to: obtain a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched; generate a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature; store the second matrix in first memory as the input feature; and generate a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.
2. The non-transitory computer readable medium of claim 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by: selecting a first entity from the entities and first and second embeddings from the embeddings; and generating the second matrix to include a first entry and a second entry, the first entry storing a first value of the first embedding associated with the first entity, the second entry storing a second value of the second embedding associated with the first entity.
3. The non-transitory computer readable medium of claim 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by: calculating weighted averages of the embeddings associated with the entities, the weighted averages including a first weighted average based on a first entity from the entities and ones of the embeddings; and generating the second matrix to include an entry storing a value of the first weighted average associated with the first entity.
4. The non-transitory computer readable medium of claim 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by: calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities, the values including a first value based on a first entity from the entities and ones of the embeddings; and generating the second matrix to include an entry storing the first value associated with the first entity.
5. The non-transitory computer readable medium of claim 1, wherein the at least one processor is to generate the second matrix based on performing a reduction technique, the at least one processor to perform the reduction technique by: selecting at least one of maximum values or minimum values of the embeddings associated with the entities, the at least one of the maximum values or the minimum values including a first value based on a first entity from the entities and ones of the embeddings; and generating the second matrix to include an entry storing the first value associated with the first entity.
6. The non-transitory computer readable medium of claim 1, wherein the data corresponds to a user access to media, the media associated with the entities and the embeddings.
7. The non-transitory computer readable medium of claim 1, wherein the entities include at least one of top search result click entities or video watch entities.
8. The non-transitory computer readable medium of claim 1, wherein the embeddings include classifications of at least one of Internet searches requested by a user or media accessed by the user.
9. The non-transitory computer readable medium of claim 1, wherein the entities are represented using integer identifiers that map to a knowledge graph.
10. The non-transitory computer readable medium of claim 1, wherein the embeddings are represented as a numerical representation of a class of at least one of objects, images, or words.
11. An apparatus comprising: a data modifier to: obtain a first matrix, the first matrix including first data indicative of entities and embeddings, the entities representative of at least one of search result clicks or videos watched, the embeddings representative of at least one of first classifications of the search result clicks or second classifications of the videos watched; generate a second matrix by reducing the first data in the first matrix to second data that satisfies a size corresponding to an input feature; and store the second matrix in first memory as the input feature; and a model generator to generate a demographic correction model based on the second matrix as the input feature, the demographic correction model to correct demographics corresponding to impressions logged in second memory.
12. The apparatus of claim 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by: selecting a first entity from the entities and first and second embeddings from the embeddings; and generating the second matrix to include a first entry and a second entry, the first entry storing a first value of the first embedding associated with the first entity, the second entry storing a second value of the second embedding associated with the first entity.
13. The apparatus of claim 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by: calculating weighted averages of the embeddings associated with the entities, the weighted averages including a first weighted average based on a first entity from the entities and ones of the embeddings; and generating the second matrix to include an entry storing a value of the first weighted average associated with the first entity.
14. The apparatus of claim 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by: calculating values of an average, a Manhattan distance, a Chebyshev distance, a Euclidean distance, or a Minkowski distance of the embeddings associated with the entities, the values including a first value based on a first entity from the entities and ones of the embeddings; and generating the second matrix to include an entry storing the first value associated with the first entity.
15. The apparatus of claim 11, wherein the data modifier is to generate the second matrix based on performing a reduction technique, the data modifier to perform the reduction technique by: selecting at least one of maximum values or minimum values of the embeddings associated with the entities, the at least one of the maximum values or the minimum values including a first value based on a first entity from the entities and ones of the embeddings; and generating the second matrix to include an entry storing the first value associated with the first entity.
16. The apparatus of claim 11, wherein the data corresponds to a user access to media, the media associated with the entities and the embeddings.
17. The apparatus of claim 11, wherein the entities include at least one of top search result click entities or video watch entities.
18. The apparatus of claim 11, wherein the embeddings include classifications of at least one of Internet searches requested by a user or media accessed by the user.
19. The apparatus of claim 11, wherein the entities are represented using integer identifiers that map to a knowledge graph.
20. The apparatus of claim 11, wherein the embeddings are represented as a numerical representation of a class of at least one of objects, images, or words.
21-40. (canceled)